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ABSTRACT 


The present era is the modern technology evolving era for cybersecurity. It boons a dynamic battlefield 
for cyber security concerns for security experts. Network intrusions have become a major concern in 
cyberspace for compromising security. Traditional methods like manual rules, blacklists, and whitelists 
are insufficient for detecting modern intrusions. While machine learning approaches for intrusion 
detection have emerged, many suffer from low accuracy. However, recent advances in machine learning 
algorithms show promise for improving intrusion detection and classification. To address the limitations 
of current methods, this work proposes a hybrid machine learning approach for intrusion detection and 
classification. The approach utilizes seven classifiers including decision tree, random forest, naive 
Bayes, ADA, XGB, KNN, and logistic regression. The model is evaluated on the CICIDS2017 dataset 
using training and testing splits. The classifiers achieve accuracy rates of 0.99 for decision tree, 0.96 for 
random forest, 0.85 for naive Bayes, 0.97 for ADA, 0.96 for XGB, 0.98 for KNN, and 0.91 for logistic 
regression. The decision tree classifier demonstrates the highest accuracy of 0.99, owing to its effective 
parametric function evaluation and ability to minimize misclassification errors. The proposed hybrid 
approach aims to advance network intrusion detection and classification capabilities beyond current 
techniques. 


Keywords: /ntrusion Detection, Machine Learning, Decision Tree, KNN, Random Forest, Naive Bayes, 
ADA, XGB 


1. INTRODUCTION 


A network intrusion refers to a computer security 
incident where an unauthorized attacker gains access 
to a computer network or resources accessible 
through it. Such intrusions are executed through 
various means, including exploiting vulnerabilities 


in network software/devices, using stolen 
credentials, or employing social engineering tactics. 
The objectives behind intrusions vary, 


encompassing actions like data theft, network 
disruption, or leveraging the compromised network 


for subsequent attacks. Detecting and preventing 
network intrusions holds paramount importance for 
network security [1]. 


Network intrusions pose a serious threat in 
today's interconnected world, enabling unauthorized 
access and potentially catastrophic damages. As 
cyberattacks grow more sophisticated, traditional 
intrusion detection techniques fail to keep pace. 
Manual signature-based methods cannot identify 
new attack patterns. Machine learning has shown 
potential but current ML intrusion detection models 
exhibit limitations. 
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With the rapid advancement of IT technology, 
communication across diverse networks among 
individuals and organizations globally has surged. 
However, this growth has brought forth security 
challenges. The sheer volume of network traffic data 
makes it intricate to differentiate between normal 
and intrusive data traffic, particularly when facing 
novel, previously unseen attack types. Moreover, 
attacks constantly evolve, making it arduous to keep 
intrusion detection systems current. 


Numerous factors contribute to computer system 
and network intrusions [2]: 
1. Weak or easily guessable passwords: Users 
opting for weak passwords or reusing them across 
accounts facilitate unauthorized access. 
2. Software vulnerabilities: Unpatched software 
vulnerabilities provide opportunities for attackers to 
exploit systems. 
3. Social engineering: Deceptive tactics like 
phishing emails or misleading calls trick users into 
revealing sensitive data or falling for malicious 
links. 
4. Inadequate security controls: Absence of 
appropriate security measures such as firewalls, 
intrusion prevention systems, and antivirus makes 
systems susceptible. 
5. Insider threats: Privileged insiders misusing 
their access to engage in malicious activities. 
6. Insecure IoT devices: Attackers exploiting 
vulnerable Internet of Things devices to breach 
networks or launch DDoS attacks. 
7. Supply chain attacks: Targeting third-party 
vendors providing components for target systems. 


Diverse approaches tackle network intrusion 
detection, including: Signature-based detection [3]: 
Matching network traffic against known malicious 
patterns or "signatures" to identify familiar attacks. 
Anomaly-based detection [4]: Identifying unusual 
network activity patterns deviating from normal 
behaviour and flagging them as potentially 
malicious. Hybrid methods [5]: Combining 
signature and anomaly-based approaches [6]. 

However, the complexity of transactions has 
increased due to essential services being provided by 
cloud data centers. This growth in volume poses 
challenges for more sophisticated detection 
algorithms [7]. Intrusion Detection Systems (IDS) 
have become vital tools to safeguard large business 
networks. They usually complement other security 
tools like firewalls, authentication servers, and 
antivirus software to bolster overall network security 


[8]. 
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Rapid technological advancements have turned 
cybersecurity into a pressing concern. The 
widespread use of mobile devices has led to a surge 
in cyberattacks and cybercrime. The antiphishing 
working group's report indicates the discovery of 
over 20 million new malware instances daily, with 
approximately 227,000 daily malware detections [9]. 
Dealing with this escalated threat has become 
challenging due to the increasing frequency and 
sophistication of malware attacks [10]. 


Traditional network intrusion detection often 
relies on signs or rules, demanding manual 
calculation for each attack type and detection based 
on signatures or security rules. However, these 
methods have shown inadequacy in handling the 
rapid network expansion, as well as attacks 
characterized by greater volume, complexity, and 
volatility [11]. In light of these ongoing challenges, 
a novel approach has been introduce, a hybrid 
machine learning methodology designed to identify 
and categorize network intrusions. This hybrid 
approach incorporates seven distinct classifiers, 
namely: decision tree, random forest, naive Bayes, 
ADABoost, XGBoost, k-nearest neighbors, and 
logistic regression. To validate the efficacy of this 
approach, it underwent rigorous testing and training 
procedures using the CICIDS2017 dataset. 


This work develops a hybrid machine learning 
model for advancing network intrusion detection by 
strategically combining diverse optimized 
classifiers. Traditional techniques and existing 
machine learning approaches exhibit limitations in 
accuracy and effectiveness against modern threats. 
Recent advancements in algorithms show potential 
if applied innovatively. The model selectively 
integrates seven classifiers, capitalizing on their 
complementary strengths. Systematic optimization 
and testing demonstrates superior accuracy over 
individual classifiers. By harnessing innovations in 
machine learning, the hybrid approach aims to 
enhance intrusion detection capabilities beyond 
current techniques against continuously evolving 
threats. 


The study offers several noteworthy 
contributions that underscore its significance in the 
field of cybersecurity and intrusion detection: 

° A novel machine learning system that 
swiftly detects potential network breaches. By 
blending different methods, it's highly accurate and 
adaptable in real-time. 

° Our research significantly improves breach 
detection accuracy using a mix of methods that 
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outperforms older techniques. 

° Our model minimizes false alerts, ensuring 
security experts focus on genuine threats rather than 
distractions. 

° It swiftly learns to handle new attack 
methods, providing proactive defense in the ever- 
changing cyber threat landscape. 

The remaining paper is organized as related work 
in section 2, methodology in section 3, results and 
discussion in section 4 and conclusion in in section 
5. 


2. RELATED WORK 


This section establishes the fundamental 
concepts and explanations necessary for grasping 
intrusion detection and its associated methodologies. 
Over recent years, a multitude of models and 
approaches stemming from traditional machine 
learning techniques have surfaced, all focused on the 
critical objective of recognizing and categorizing 
network intrusions. Notable examples encompass 
the support vector machine (SVM) [12], random 
forest (RF) [13], decision tree (DT) [14], artificial 
neural network (ANN) [15], logistic regression (LR) 
[16], and naïve Bayes [17]. 


In a recent study [18], a multiclass classification 
technique named average one dependence estimator 
was introduced for intrusion detection and 
classification. It exhibited an accuracy of 83.47% 
and a false alarm rate (FAR) of 6.57% on the 
UNSW-NB15 dataset, utilizing a limited set of 
features. Another investigation proposed a random 
forest algorithm [19] on the UNSW-NB15 dataset, 
achieving an accuracy of 82% and a FAR of 4.4% 
with the utilization of five selected features. 


A machine learning approach employing 
recurrent neural networks (RNN) [20] was applied to 
the KDD99 dataset, demonstrating an accuracy of 
97.9% and a FAR of 0.5%. However, this method 
falls short of delivering a significantly impactful 
accuracy for robust intrusion detection and 
classification. Conversely, a technique based on a 
random forest classifier [21] yielded an accuracy 
value of 97.85% but suffered from a high FAR value 
of 2.15%, indicating inefficacy in detecting 
sophisticated intrusion attacks. 


Furthermore, a framework [22] built upon 
principle component analysis (PCA) and the binary 
Gaussian model was introduced for detecting cyber- 
attacks in mobile cloud environments. However, the 


E-ISSN: 1817-3195 


testing process of this methodology lacks clarity in 
terms of achieving the main outcomes. Table 1 
elucidates the assortment of existing intrusion 
detection methods along with pertinent features as 
outlined below. This table furnishes a 
comprehensive overview of diverse intrusion 
detection techniques, encompassing their respective 
outcomes, methodologies, challenges, and notable 
attributes. The primary objective of these 
methodologies is to fortify network security by 
accurately discerning and classifying distinct types 
of network traffic. The table accentuates the 
importance of adopting diverse approaches to attain 
heightened accuracy rates while effectively 
addressing specific challenges. 


Table 1: Comparative Analysis of Intrusion Detection 
Methods 


nce 
2021 Advanc | Accurac | © More 
[23] ed tree- | y:95.95 | datasets can be 
based % used like 
machine 
learning . UNSW- 
techniq NB15 
Eee ° CICIDS2 
017 
2021 Reduce | Accurac | © UNSW- 
[18] d Error | y:97.94 | NB15 dataset 
Pruning %, used 
Tree 
FAR: e Not used 
0.000 | all the features, 
reduced some 
features from 44 
to 20 
2020 Accurac | © Only 
[24] y deals with two 
83.63%, | types of packets 
98.23%, 
97.81%, Normal 
95 13% Encrypte 
respecti 
vely 
2017 | Random | Accurac | e No real 
[19] forest y:82% | time dataset used 
algorith 
= (FAR) | e Two-way 
of 4.4% | training may 
increase accuracy. 
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Not used all the 
features, work on 
limited features 


Accurac 
y:95% 


The Table 1 showcases a diverse range of 
approaches employed in intrusion detection, each 
contributing unique insights and results to the field. 
By examining their methodologies, outcomes, and 
challenges, researchers and practitioners can make 
informed decisions about the most suitable 
techniques for enhancing network security based on 
the specific context and requirements. Future studies 
may further build upon these methods to achieve 
even higher accuracy rates and address emerging 
challenges in network intrusion detection. 


3. METHODOLOGY 


The methodology describes the proposed hybrid 
ML approach for network intrusion detection and 
classification. The hybrid approach architecture with 
its parametric functionality is given below. 


3.1 Proposed Hybrid Machine 
Techniques 


Learning 


The network intrusion detection process is 
detected and classified using a variety of machine 
learning techniques included in the suggested model. 
For the detection and categorization of intrusions, 
nearly seven ML classifiers (DT, RF, NB, ADA, 
XGB, KNN, and Logistic regression) are employed. 
For the required dataset for the intrusion detection 
and classification procedure, these models used 
various parameters and processes to provide results. 
The workflow of model and classifier details are 
given in Figure 1. 


The central hypothesis is that a hybrid intrusion 
detection model strategically integrating multiple 
complementary machine learning algorithms trained 
on diverse real-world data will achieve significantly 
higher accuracy, adaptability, and generalizability 
compared to individual or narrowly focused 
ensemble models. The proposed solution is an 
intrusion detection system that combines optimized 
implementations of algorithms including decision 
trees, random forests, SVM, neural networks, and 
naïve Bayes. This diversity aims to mitigate the bias, 
overfitting, and limitations on novel attack detection 
faced by reliance on singular classifiers or outdated 
synthetic datasets. The hybrid model’s integration of 
selective, optimized, and diverse classifiers is 
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predicted to align with the complexity of modern 
threats, transforming recent ML advancements into 
deployable defenses with the accuracy and 
adaptability needed for the dynamic real-world 
threat landscape. 


3.1.1 Decision tree 


To employ the decision tree machine learning 
technique [26] for the detection and classification of 
network intrusions on the CISIDS2017 dataset, 
firstly, gather the CISIDS2017 dataset, containing 
network traffic data and corresponding intrusion 
labels. Preprocess the dataset by cleaning and 
normalizing the data, ensuring it is in a suitable 
format for analysis. Split the dataset into training and 
testing sets. Next, select relevant features that 
contribute to intrusion detection and classification. 
Apply feature selection techniques to eliminate 
redundant or irrelevant features. Train the decision 
tree model using the training dataset, allowing it to 
learn optimal decision rules based on the features 
and intrusion labels. Evaluate the model's 
performance by testing it on the separate testing 
dataset, calculating metrics like accuracy, precision, 
recall, and F1 score. Fine-tune the decision tree 
parameters and feature selection to enhance the 
model performance. Finally, deploy the trained 
decision tree model for real-time intrusion detection 
and classification by providing network traffic data 
as input and obtaining the predicted intrusion labels 
as output. 


3.1.2 Random forest 


To utilize Random forest [27] for network 
intrusion detection on CISIDS2017 dataset gather, 
pre-process, and split the dataset into training and 
testing sets. The relevant features are selected 
eliminate redundancy or irrelevancy. Then construct 
a Random forest model with an ensemble of decision 
trees and train the model on the training dataset, 
evaluate performance on the testing set. In the end 
adjust parameters (e.g., number of trees, maximum 
depth) for optimization and evaluation of the model. 


3.1.3 Naive bayes 


Naive Bayes [28] is a ML technique used for 
intrusion detection and classification. The workflow 
involves dataset preparation, including cleaning and 
normalization, and splitting into training and testing 
sets [29]. Relevant features are selected and 
redundant ones are eliminated. The Naive Bayes 
model is trained using the training data, estimating 
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performance is evaluated using metrics like accuracy 
and precision [30]. 


3.1.4 AdaBoost 


AdaBoost, or Adaptive Boosting, is a machine 
learning technique used for intrusion detection and 
classification [31].The workflow involves dataset 
preparation, feature selection, and the construction 
of an ensemble model using weak classifiers. This 
model is trained by assigning weights to the weak 
classifiers based on their performance and focusing 
on misclassified instances in subsequent iterations. 
The model performance is evaluated using metrics 
such as accuracy and precision. Parameters can be 
fine-tuned to optimize performance. The trained 
model is deployed for real-time intrusion detection 
and classification, and continuous monitoring and 
updates are necessary to ensure its effectiveness. Its 
ability to adaptively weight weak classifiers makes 
it a powerful approach for intrusion detection tasks. 


3.1.5 XGBoost 


XGBoost, or Extreme Gradient Boosting [32] 1s 
a powerful ML technique for intrusion detection and 
classification. The workflow involves dataset 
preparation, feature selection, and the construction 
of an ensemble model using decision trees. The 
XGBoost model is trained by correcting errors from 
previous trees and learning from gradients. 
Performance is evaluated using metrics like 
accuracy and precision. Parameters are fine-tuned 
for optimization. Its ability to handle complex 
relationships and utilize optimization techniques 
make it valuable for intrusion detection and 
classification. 


3.1.6 KNN 


K-Nearest Neighbors (KNN) is a machine 
learning technique used for intrusion detection and 
classification. This approach involves dataset 
preparation, feature selection, and the construction 
of a KNN model. The model assigns each data point 
to its K nearest neighbors based on a chosen distance 
metric. The KNN model is trained using a training 
dataset and evaluated using different statistical 
metrics. The model's performance can be optimized 
by selecting an appropriate value for K and fine- 
tuning the distance metric or other parameters [33]. 


3.1.7 Logistic regression 


detection and classification. The workflow involves 
dataset preparation, feature selection, and training 
the model using the logistic regression algorithm. 
Model performance is evaluated using metrics like 
accuracy and precision. Parameters can be fine- 
tuned for optimization. The trained model is 
deployed for real-time intrusion detection, and 
continuous monitoring and updates are necessary to 
maintain its effectiveness. Its interpretability and 
simplicity make it a valuable tool for intrusion 
detection and classification tasks. 
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Figure 1: Workflow of Proposed Model 


3.2 Dataset 


The CICIDS2017 dataset has been growing since 
its inception attracting researchers for model and 
algorithm study and development process. It 
spanned eight separate files containing attacks traffic 
data, according to the author [34] of CICIDS2017. 
The consolidated dataset of CICIDS2017 with 
1182213 instances after eliminating those missing 
instances. Surprisingly, no duplicate instances were 
discovered. Table 2 shows the characteristics of the 
combined dataset as well as the comprehensive 
class-wise occurrence. 


Table 2: CISIDS2017 Dataset 
Class Label Class Name 


0 Benign 


Web attack- XSS 
Infiltration 


l 
2 
3 
4 
5 
6 
7 


Web attack-sql injection 


3.3 Data Preprocessing 
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To obtain the desired dataset shape, data pre- 
processing is applied to the entire dataset. The 
normalization procedure is used to balance the types 
of datasets and the quantity of values needed for 
additional processing. This phase also includes 
feature selection and dataset cleansing. For testing 
and training purposes, the dataset is split into test 
30% and training 70% portions. There are 8 needed 
classes in total, and Table 2 lists the number of 
instances for each class. 


3.4 Extraction of Features 


The pre-processing task of feature extraction 
comprises selecting specific linked structures to 
create training and testing datasets for the algorithm. 
This achieves several objectives: it lessens the 
chance of overfitting, simplifies analysis, and raises 
the possibility of generalization. The model will take 
longer to train and test due to inefficient feature 
extraction because it must process more data than is 
necessary. 


3.4.1 Training 


At this stage, a model is generated by feeding the 
training dataset from the data pre-processing phase 
into the chosen ML algorithm. Depending on the 
methodology, this step could be repeated several 
times or only once. 


3.4.2 Testing 


Once the model has been developed, the test data 
from the pre-processing phase is given into the 
model structure identical to that of the training 
dataset. How accurately the classification or 
prediction is made is determined. The more accurate 
a model is, the closer it gets to being 100% accurate. 
Of course, it is statistically challenging to create a 
model with perfect accuracy when the size of the 
data increases and the model is modified. 


4. RESULTS AND DISCUSSION 


The result section describe the overall results 
during the training and testing phases. It also explain 
the statistical evaluation metrics and results with 
brief explanation. 


4.1 Evaluation Measures 


The proposed model utilizes various machine 
learning classifiers for the purpose of network 
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intrusion detection and classification. The classifier 
values are determined using specific statistical 
equations presented in equations 1-4. The terms true 
Positive (tp), true negative (tn), false positive (fp), 
and false negative (fn) represent the correct 
classification of positive and negative instances. To 
evaluate the model's output, statistical measures 
such as recall (R), accuracy rate (Acc), and precision 
(Pre) are employed. In cases where there is a trade- 
off between accuracy and recall, the Fl-score is 
utilized to assess the overall performance of the 
model. 


_ P 1 
= tp + fp (1) 
tp 
R= 2 
tp + fp (2) 
a 2(R * Pre) 
an — i + Pre (3) 
tp + tn 
Acc = ——————_| (4) 
tp + tn + fp + fn 
4.2 Model Evaluation 
© Decision tree: The evaluation of decision 


tree metrics for training and testing, and accuracy 
values of 0.99 are given in Table 3 and confusion 
matrix in Figure 2 for test dataset. It shows that the 
most common type of cyber-attack is benign traffic. 
However, there are also a significant number of 
portscan, bot, brute force, infiltration, XSS, and SQL 
injection attacks. It is important to be aware of these 
different types of cyberattacks in order to protect 
your computer systems. The number of cyberattacks 
has been increasing over time. This is likely due to 
the increasing sophistication of cyber attackers and 
the increasing number of connected devices. The 
number of benign traffic has also been increasing 
over time. This is likely due to the increasing amount 
of data being transmitted over the internet. The 
number of portscan attacks has remained relatively 
constant over time. This is likely because portscan 
attacks are a relatively simple type of attack that can 
be easily automated. The bot attacks has been 
increasing over time. This is likely due to the 
increasing use of botnets to launch other types of 
cyberattacks. 


The brute force attacks has been decreasing over 
time. This is likely due to the increasing use of 
stronger passwords and multi-factor authentication. 
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The number of infiltration attacks has been most common type of traffic. This is because 


increasing over time. This is likely due to the 
increasing sophistication of cyber attackers and the 
increasing number of vulnerabilities in computer 
systems. The XSS attacks has been decreasing over 
time. This is likely due to the increasing use of web 
application firewalls and other security measures. 
The SQL injection attacks has been decreasing over 
time. This is likely due to the increasing use of 
database security measures. 


Benign 
Portscan- 
DDoS- 
Bot- 


Brute force- 


XSS- 


Infiltration 


sql injection- 0 0 i) 0 0 1351 2 


Benign Portscan DDoS Bot Brute force XSS 


Figure 2: Decision Tree Confusion Matrix 


e Random Forest: The evaluation of 
random forest metrics for training and testing, and 
accuracy values of 0.96 are given Table 3 and 
confusion matrix in Figure 3 for test dataset. It shows 
the comparative analysis of the different types of 
cyberattacks shows that benign traffic is the most 
common type of traffic. This is because most of the 
traffic on the internet is benign. Portscan traffic is the 
second most common type of traffic. This is because 
portscan attacks are often used to scan for 
vulnerabilities in computer systems. Bot traffic is the 
third most common type of traffic in the first, but it 
decreases to the fifth most common type of traffic. 
This is because botnets are often used to launch other 
types of cyberattacks. Brute force attacks are the 
fourth most common type of traffic, but it decreases 
to the seventh most common type of traffic. This is 
because brute force attacks are often used to guess 
passwords. Infiltration attacks are the fifth most 
common type of traffic, but it increases to the third 


infiltration sql in 


infiltration attacks are often used to gain 
unauthorized access to computer systems. SQL 
injection attacks are the sixth most common type of 
traffic. This is because SQL injection attacks can be 
used to steal data from databases. 


Benign geen ts 0 0 0 0 0 


Portscan- 


DDoS- 0 


Bot 0 


Brute force 0 


21 


XSS 0 


Infiltration 0 0 0 0 0) l 


sql injection 0 0 0 0 0 944 3 


Benign Portscan DDoS Bot Brute force XSS 


Figure 3: Naïve Baye Confusion Matrix 


e Naive Baye: The evaluation of naive baye 
metrics for training and testing, and accuracy values 
of 0.85 are given Table 3 and confusion matrix in 
Figure 4 for test dataset. It shows the experimental 
values of different types of cyberattacks detected by 
a security system over a period of time. It shows that 
the number of benign traffic has increased from 8849 
to 10791. This is likely due to the increasing amount 
of data being transmitted over the internet. The 
number of portscan attacks has also increased from 
6 to 10660. This is likely due to the increasing 
sophistication of cyber attackers and the increasing 
number of vulnerabilities in computer systems. The 
number of bot attacks has decreased from 192 to 
9209. This is likely due to the increasing use of 
botnets to launch other types of cyberattacks. The 
number of brute force attacks has remained the same 
at 0. This is likely due to the increasing use of 
stronger passwords and multi-factor authentication. 
The number of infiltration attacks has increased from 
263 to 533. This is likely due to the increasing 
sophistication of cyber attackers and the increasing 
number of vulnerabilities in computer systems. The 
number of SQL injection attacks has decreased from 
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639 to 7891. This is likely due to the increasing use 
of database security measures. 


Benign 0 0 
Portscan- 4 0 
DDoS- 0 0 


Bot- 


Brute force- 28 21 


XSS- 0 0 0 0 0 1044 1040 
Infiltration - 0 0 0 0 0 0 


sql injection- 0 0 0 0 0 296 533 


Benign Portscan DDoS Bot Brute force XSS 


Figure 4: Naïve Baye Confusion Matrix 


© ADA Classifier: The evaluation of ADA 
Classifier metrics for training and testing, and 
accuracy values of 0.86 are given Table 3 and 
confusion matrix in Figure 5 for test dataset. It shows 
the number of different types of cyberattacks 
detected by a security system over a period of two 
months. The most common type of cyber-attack is 
benign traffic, followed by portscan traffic. The 
number of bot, brute force, infiltration, and SQL 
injection attacks are much lower. It also indicate that 
the number of benign traffic has increased, while the 
number of bot, brute force, infiltration, and SQL 
injection attacks have decreased. This is likely due 
to the increasing use of security measures to protect 
computer systems. 


infiltration sq 
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Brute force- 


XSS- 


Infiltration 


sql injection- 
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Benign Portscan DDoS Bot Infiltration — sql injection 


Figure 5: ADA Classifier Confusion Matrix 


e XGB Classifier: The evaluation of XGB 
Classifier metrics for training and testing, and 
accuracy values of 0.96 are given Table 3 and 
confusion matrix in Figure 6 for test dataset. The 
figure shows the experimental values of different 
types of cyberattacks detected by a security system. 
The most common type of cyber-attack is benign 
traffic, followed by portscan traffic. The number of 
bot, brute force, infiltration, and SQL injection 
attacks are much lower. The number of benign traffic 
is 10643, followed by portscan traffic with a value 
of 10714. The number of bot, brute force, 
infiltration, and SQL injection attacks are much 
lower, with values of 0, 0, 1, and 0 respectively. 


Benign 
Portscan- 
DDoS- 

Bot- 

Brute force- 
KSS- 
Infiltration - 


sql injection- 0 0 0 0 0 1215 


Benign Portscan DDoS Bot Brute force XSS 


Figure 6: XGB Classifier Confusion Matrix 


e KNN Classifier: The evaluation of KNN 
Classifier metrics for training and testing, and 
accuracy values of 0.98 are given Table 3 and 
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confusion matrix in Figure 7 for test dataset. The 
confusion matrix in the Figure 7 shows the true and 
predicted classifications of the model. The model 
correctly classified 10643 instances of benign traffic 
and 10714 instances of portscan traffic. The model 
misclassified 1 instance of brute force traffic as 
infiltration traffic. 
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Figure 7: KNN Classifier Confusion Matrix 


e Lgr Classifier: The evaluation of Lgr 
Classifier metrics for training and testing, and 
accuracy values of 0.91 are given Table 3 and 
confusion matrix in Figure 8 for test dataset. The 
number of benign traffic has increased from 10643 
to 10791 due to the increasing amount of data being 
transmitted over the internet. The number of 
portscan attacks has decreased from 10714 to 0. This 
is likely due to the increasing use of security 
measures to protect computer systems. The number 
of bot attacks has also decreased from 0 to 0. This is 
likely due to the increasing use of security measures 
to protect computer systems. The number of brute 
force attacks has decreased from 1 to 0. This is likely 
due to the increasing use of strong passwords and 
multi-factor authentication. The number of 
infiltration attacks has increased from 0 to 1. This is 
likely due to the increasing sophistication of cyber 
attackers and the increasing number of 
vulnerabilities in computer systems. The number of 
SQL injection attacks has remained the same at 0. 
This is likely due to the increasing use of database 
security measures. 
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Figure 8: Logistic Regression Classifier Confusion 
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decision tree classifier shows the highest accuracy of 
0.99 equal to 99%.The naïve baye demonstrate the 
lowest accuracy which is 0.85 equal to 85%.The 
decision tree highest accuracy is due to the 
parameters and filters used in the model evaluation. 
Overall, the Decision Tree classifier is the most 
accurate classifier in the table. The Random Forest 
classifier is also a very accurate classifier, and it is a 
good choice if you want to avoid overfitting. The 
Naive Bayes classifier is the least accurate classifier 
in the table, but it is a simple and efficient classifier 
that can be used for small datasets. The lowest 
accuracy of naive baye is due to the miss 
classification occur during the model evaluation. 
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Table 5: Accuracy Comparison with State-Of-The-Art 
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The Table 5 presents the results comparison with 
the state-of-the-art techniques according to the 
results, the Autoencoder achieves an accuracy of 
80%. The Variational Autoencoder performs better 
with an accuracy of 89.08%. Long-term short term 
memory (LSTM) models show promising results, 
with one LSTM model achieving an accuracy of 
98% and another LSTM model achieving an 
accuracy of 97.58%.The Decision Tree model 
demonstrates a high accuracy of 98%, indicating its 
effectiveness in detecting and classifying network 
intrusions. Support Vector Machine (SVM) models 
also exhibit strong performance, with one SVM 
model achieving an accuracy of 98.76% and another 
SVM model achieving an accuracy of 
97.7%.Comparing the different methodologies, it 
can be observed that the SVM models generally 
achieve higher accuracies compared to the 
Autoencoder, Variational Autoencoder, and LSTM 
models. The Decision Tree model also performs 
well, showing similar accuracy 99% levels to the 
SVM models. Overall, the comparison highlights the 
varying levels of accuracy achieved by different 

The results in the Table 4 shows the comparison methodologies, providing insights into their 
of classifier results on CICIDS2017 dataset. The 
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performance for 
classification tasks. 


intrusion detection and 


5. CONCLUSION 


This study introduces a hybrid machine learning 
approach for the purpose of network intrusion 
detection and classification. The proposed approach 
incorporates seven classifiers, namely DT, RF,NB, 
ADA, XGB, KNN, and Lr, to evaluate the model's 
performance. The evaluation of the model is 
conducted on the CICIDS2017 dataset after pre- 
processing. The accuracy rates obtained for each 
classifier are as follows: decision tree: 0.99, random 
forest: 0.96, naive bayes: 0.85, ADA: 0.97, XGB: 
0.96, KNN: 0.98, and Lr: 0.91. The DT classifier 
achieves the highest accuracy of 0.99, attributed to 
its parametric function evaluation and reduced 
misclassification error. Conversely, the NB classifier 
exhibits the lowest accuracy of 0.85, which can be 
attributed to misclassification errors and limited 
parameter usage in the evaluation process. Based on 
the accuracy values, the DT classifier emerges as the 
most effective approach for network intrusion 
detection and classification. Future work entails 
evaluating the model on additional datasets to 
compare results and further enhance the accuracy 
rate. 
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